Abstract
This contribution is based on the contents of a talk delivered at the Next-SigmaPhi conference held in Crete in August 2005. It is addressed to an audience of physicists with diverse horizons and does not assume any background in communications theory. Capacity-approaching error-correcting codes for channel communication, known as Low Density Parity Check (LDPC) codes, have attracted considerable attention from coding theorists in the last decade. Surprisingly strong connections with the theory of diluted spin glasses have been discovered. In this work we elucidate one new connection: a class of correlation inequalities valid for Gaussian spin glasses can be applied to the theoretical analysis of LDPC codes. This allows a rigorous comparison between the so-called (optimal) maximum a posteriori decoder and the computationally efficient belief propagation decoder. The main ideas of the proofs are explained, and we refer to recent works for the lengthier technical details.
I. CODES FOR COMMUNICATION THROUGH NOISY CHANNELS



We consider a (simplified) communication system with three basic building blocks: the 
encoder, the channel and the decoder. 

Encoder. Suppose that the messages to be sent are labelled $\{1, \dots, M\}$ and that $M = 2^K$. The messages can be represented by binary strings of length $K$, so that sending one message transmits $K$ information bits. Because of channel imperfections, these binary strings are encoded before they are fed into the channel. In general the encoder is a map $\mathbb{F}_2^K \to \mathbb{F}_2^N$, with $\mathbb{F}_2 = \{0, 1\}$ and $N > K$. The codebook therefore consists of $2^K$ code words, which are binary strings of length $N$, $x = (x_1, \dots, x_N)$. In order to send $K$ information bits we make $N$ uses of the channel, and one says that the rate of transmission is $R = K/N$.

Channel. We take a discrete memoryless channel with binary input and a general output alphabet (for example $\mathbb{F}_2$ or $\mathbb{R}$). Given a sent codeword $x = (x_1, \dots, x_N)$, the received word is $y = (y_1, \dots, y_N)$ with probability $p_{Y|X}(y|x) = \prod_{i=1}^{N} p_{Y|X}(y_i|x_i)$. In this context the choice of the transition probability $p_{Y|X}$ specifies the channel model, and it is assumed to be known to both the sender and the receiver.

Decoder. Given that $x^{\rm in}$ is sent, the receiver possesses a deformed version $y$ of it (the channel observations, or channel output). His task is to find estimates $D(y)$ such that the bit probability of error $P_{\rm error}\big((D(y))_i \neq x_i^{\rm in}\big)$ is as small as possible. One can show that the best decoder (the one with the smallest bit probability of error) is the Maximum a Posteriori (MAP) estimator $\hat{x}_i^{\rm MAP} = \mathrm{argmax}_{x_i}\, p_{X_i|Y}(x_i|y)$. Unfortunately this estimator cannot be computed efficiently, so other, suboptimal estimators must be considered. It is then important to compare their relationship to, and performance against, the MAP estimator. This problem is addressed here for LDPC codes and for the suboptimal estimator given by Belief Propagation (BP).
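As an illustration of the MAP rule, and of why it is expensive, the following sketch decodes a toy code by enumerating all codewords. It assumes a uniform prior over codewords, a binary symmetric channel with crossover probability `p`, and the Hamming [7,4] parity-check matrix as a small (non-LDPC) example. The function names `codewords` and `bitwise_map` are hypothetical. The cost grows like $2^N$.

```python
import itertools
import math

def codewords(H):
    """All x in {0,1}^N with H x = 0 mod 2 (exhaustive search: toy sizes only)."""
    N = len(H[0])
    return [x for x in itertools.product((0, 1), repeat=N)
            if all(sum(h * xk for h, xk in zip(row, x)) % 2 == 0 for row in H)]

def bitwise_map(H, y, likelihood):
    """Bitwise MAP: x_i = argmax_{x_i} p(x_i | y), assuming a uniform prior on the code.

    likelihood(y_i, x_i) returns the channel transition probability p_{Y|X}(y_i | x_i).
    Returns the decisions and the posterior marginals p(x_i = 1 | y).
    """
    N = len(H[0])
    weights = {x: math.prod(likelihood(y[i], x[i]) for i in range(N)) for x in codewords(H)}
    Z = sum(weights.values())
    p_one = [sum(w for x, w in weights.items() if x[i] == 1) / Z for i in range(N)]
    return [1 if q > 0.5 else 0 for q in p_one], p_one

# Illustrative example: Hamming [7,4] code over a BSC with crossover probability 0.1.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
p = 0.1

def bsc(yi, xi):
    return 1 - p if yi == xi else p

y = [1, 0, 0, 0, 0, 0, 0]  # all-zero word sent, first bit flipped by the channel
x_hat, p_one = bitwise_map(H, y, bsc)
print(x_hat)  # [0, 0, 0, 0, 0, 0, 0]
```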

Shannon's noisy channel coding theorem asserts that one can communicate reliably as long as the rate $R$ is smaller than the channel capacity $C = \max_{p_X} I(X;Y)$. In this formula $I(X;Y)$ is the mutual information between the random variables $X$ and $Y$. It can be interpreted as the information gained about $X$ once $Y$ is observed. The maximization over the prior distribution $p_X$ of the codewords corresponds to finding the best possible codebook. In formulas, $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$. Here the Shannon entropy of $X$ is $H(X) = -\sum_x p_X(x) \ln p_X(x)$, the conditional entropy is $H(X|Y) = -\sum_{x,y} p_Y(y)\, p_{X|Y}(x|y) \ln p_{X|Y}(x|y)$, and similarly with $X$ and $Y$ exchanged. All marginals are computed from $p_{X,Y}(x,y) = p_X(x)\, p_{Y|X}(y|x)$, so $C$ is a functional of the channel transition probability. Moreover, there is no way to communicate reliably when $R > C$.

More precisely, let $R < C - \epsilon$, where $\epsilon > 0$ is as small as we wish. There exists an $N(\epsilon)$ such that for each $N > N(\epsilon)$ we can find encoding and decoding maps (the optimal decoder does the job) with $P_{\rm error} < \epsilon$. Conversely, if $R > C + \epsilon$, then for any $N$ and any encoding map $P_{\rm error} > p_0 > 0$, for some $p_0$ independent of $N$.
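The definitions above can be evaluated numerically for a small channel. The sketch below assumes a binary symmetric channel with crossover probability `p`, an example chosen here rather than taken from the text. It computes $I(X;Y)$ in nats from the joint distribution $p_X(x)p_{Y|X}(y|x)$ and maximizes it over $p_X$ on a grid. For this channel the maximum sits at the uniform prior and equals $\ln 2 - h(p)$, where $h$ is the binary entropy in nats.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in nats; channel[x][y] = p_{Y|X}(y|x), p_x[x] = p_X(x)."""
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(len(channel[0]))]
    info = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(channel[x]):
            if px * pyx > 0:  # joint p_{X,Y}(x, y) = p_X(x) p_{Y|X}(y|x)
                info += px * pyx * math.log(pyx / p_y[y])
    return info

def capacity_binary_input(channel, grid=1001):
    """C = max over p_X = (1 - q, q) of I(X;Y), with q on a uniform grid (approximation)."""
    return max(mutual_information([1 - q, q], channel)
               for q in (k / (grid - 1) for k in range(grid)))

p = 0.11  # crossover probability of a binary symmetric channel (illustrative)
bsc = [[1 - p, p], [p, 1 - p]]
h2 = -p * math.log(p) - (1 - p) * math.log(1 - p)  # binary entropy in nats
print(capacity_binary_input(bsc), math.log(2) - h2)
```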

For our purpose it is more convenient to fix a desired rate $R$ once and for all. The inequality $R < C$ then becomes a condition $n < n_{\rm Sh}$ on the channel noise $n$, where the Shannon threshold $n_{\rm Sh}$ is a (channel-dependent) function of $R$. This means that we can reliably transmit at rate $R$ as long as the channel noise is lower than $n_{\rm Sh}$.
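To make the translation $R < C \Leftrightarrow n < n_{\rm Sh}$ concrete, one can take the crossover probability of a binary symmetric channel as the noise parameter $n$. This choice is an illustrative assumption. The capacity of this channel, $1 - h_2(p)$ bits per use, decreases on $[0, 1/2]$, so $n_{\rm Sh}(R)$ follows from a bisection, as in this minimal sketch.

```python
import math

def h2_bits(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def shannon_threshold_bsc(rate, tol=1e-12):
    """Largest crossover probability p in [0, 1/2] with capacity 1 - h2(p) >= rate (bits/use)."""
    lo, hi = 0.0, 0.5  # capacity decreases from 1 to 0 on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1 - h2_bits(mid) > rate:
            lo = mid
        else:
            hi = mid
    return lo

print(shannon_threshold_bsc(0.5))  # roughly 0.11 for rate one half
```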

Shannon's theorem is not constructive. It guarantees the existence of a good encoder in an ensemble of random codes, but it does not tell us how to construct "good" (capacity-approaching and computationally efficient) encoders and decoders. One of the main themes of information and coding theory over the last fifty years has been to precisely define and address such questions. A fruitful idea is to restrict the encoder maps to the class of linear error-correcting codes. Remarkably, Shannon's theorem remains true under this restriction, so there is no loss in capacity. For more details we refer the reader to [?].

For us a linear code is a vector subspace of $\mathbb{F}_2^N$ of dimension $K < N$. The subspace can be defined as the kernel of an $M \times N$ parity check matrix $H$ with $N - M = K$. In other words, the code words satisfy $M$ constraints (the so-called parity checks)

$$\sum_{k=1}^{N} H_{lk} x_k = 0 \mod 2, \qquad l = 1, \dots, M, \qquad H_{lk} \in \{0, 1\} \qquad (1)$$
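The parity checks (1) amount to computing the syndrome $Hx \bmod 2$. The minimal sketch below uses the Hamming [7,4] matrix as an illustrative example; it is not an LDPC matrix. The rate helper assumes that the rows of $H$ are linearly independent, so that $K = N - M$.

```python
def syndrome(H, x):
    """Return H x mod 2; x is a codeword iff every entry is 0, as in (1)."""
    return [sum(h_lk * x_k for h_lk, x_k in zip(row, x)) % 2 for row in H]

def design_rate(H):
    """R = K/N = 1 - M/N, assuming the M rows of H are linearly independent over F_2."""
    M, N = len(H), len(H[0])
    return 1 - M / N

# Illustrative parity-check matrix (Hamming [7,4]); rows are checks, columns are bits.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(syndrome(H, [1, 1, 1, 0, 0, 0, 0]))  # [0, 0, 0]: a codeword
print(syndrome(H, [1, 0, 0, 0, 0, 0, 0]))  # [1, 0, 0]: the first check is violated
print(design_rate(H))                       # approximately 4/7
```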

Note that the rate of the code is $R = K/N = 1 - M/N$. A very useful graphical representation of a linear code is the Tanner graph (or factor graph). This is a bipartite graph with variable nodes $i \in \{1, \dots, N\}$, check nodes $A \in \{1, \dots, M\}$, and edges connecting variable and check nodes. We say that a variable node $i$ "belongs" to a check node $A$, written $i \in A$, if and only if it appears in the parity check equation labeled by $A$. In this case an edge connects $i$ and $A$ (see figure 1). Low Density Parity Check (LDPC) codes are a special class of linear codes with sparse Tanner graphs: the degrees (or coordination numbers) of check and variable nodes are $O(1)$ with respect to $N$. For such codes there is still a threshold phenomenon as in Shannon's theorem. In general, however, the maximal rate at which error-free communication is possible lies below Shannon's capacity. On the other hand, suboptimal but computationally efficient decoding algorithms exist.

FIG. 1: A Tanner graph. Check nodes on the top row constrain the bits attached to variable nodes on the bottom row.
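The Tanner graph is simply the incidence structure of $H$. The sketch below builds it with a hypothetical helper `tanner_graph`, using the same illustrative Hamming matrix, whose graph is small but not sparse in the LDPC sense. It lists the variables $i \in A$ for each check $A$, and the checks containing each variable. The node degrees follow directly.

```python
def tanner_graph(H):
    """Neighbourhoods of the bipartite Tanner graph of H.

    checks[A] lists the variable nodes i in A; variables[i] lists the checks containing i.
    """
    M, N = len(H), len(H[0])
    checks = [[i for i in range(N) if H[A][i]] for A in range(M)]
    variables = [[A for A in range(M) if H[A][i]] for i in range(N)]
    return checks, variables

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
checks, variables = tanner_graph(H)
print([len(c) for c in checks])     # check degrees: [4, 4, 4]
print([len(v) for v in variables])  # variable degrees: [1, 1, 2, 1, 2, 2, 3]
```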



II. LOW DENSITY PARITY CHECK CODES AS DILUTED SPIN GLASSES 

The close connection of the above formalism to random spin systems was first noticed by Sourlas [?]. This connection is quite general and is not limited to binary alphabets, memoryless channels and linear codes; here we rephrase it for low density parity check codes. If code word bits are represented by spins through the mapping $s_i = (-1)^{x_i}$, the parity check equations (1) become

$$\frac{1}{2}(1 + s_A) = 1, \qquad s_A = \prod_{i \in A} s_i, \qquad A = 1, \dots, M \qquad (2)$$
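For a single check, the equivalence between (1) and (2) can be verified exhaustively. The sketch below assumes a check of degree 6, chosen only for illustration.

```python
import itertools

def parity_ok(bits):
    """Parity check in F_2: the sum of the bits is even."""
    return sum(bits) % 2 == 0

def spin_check_ok(bits):
    """Same check in spin language: (1 + s_A) / 2 == 1 with s_A = prod_i s_i, s_i = (-1)^{x_i}."""
    s_A = 1
    for x in bits:
        s_A *= (-1) ** x
    return (1 + s_A) // 2 == 1

# Exhaustive comparison over all bit patterns on a degree-6 check node.
agree = all(parity_ok(b) == spin_check_ok(b) for b in itertools.product((0, 1), repeat=6))
print(agree)  # True
```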

The a posteriori probability distribution used in MAP decoding is nothing else than the Gibbs measure of a spin system. The spins are attached to variable nodes, while check nodes are a convenient way to represent their many-body interactions. By Bayes' rule, with a uniform prior on the code $\mathcal{C}$,

$$p_{X|Y}(x|y) = \frac{\mathbb{1}_{\mathcal{C}}(x) \prod_{i=1}^{N} p_{Y|X}(y_i|x_i)}{\sum_{x' \in \mathcal{C}} \prod_{i=1}^{N} p_{Y|X}(y_i|x'_i)} \qquad (3)$$

This is a Gibbs measure $\langle - \rangle_{\mathcal{C}} = \frac{1}{Z_{\mathcal{C}}} \sum_{s} (-)\, e^{-\mathcal{H}_{\mathcal{C}}}$ with Hamiltonian

$$\mathcal{H}_{\mathcal{C}} = -\sum_{A \in \mathcal{C}} J_A (s_A - 1) - \sum_{i=1}^{N} h_i s_i, \qquad s_A = \prod_{i \in A} s_i \qquad (4)$$

where $J_A = +\infty$ and $h_i = \frac{1}{2} \ln \frac{p_{Y|X}(y_i|0)}{p_{Y|X}(y_i|1)}$. The channel observations enter through a quenched random magnetic field $h_i$, whose distribution is induced by the distribution of the channel observations. It can be shown that for symmetric channels (these satisfy $p(y|x) = p(-y|-x)$ with inputs written as $\pm 1$) there is no loss in generality in assuming that the input word is $(x_1^{\rm in} = 0, \dots, x_N^{\rm in} = 0)$. The distribution of the channel observations is then $\prod_{i=1}^{N} p_{Y|X}(y_i|0)$. Another source of quenched randomness is the Tanner graph (defining the coupling constants $J_A$), which is taken from an ensemble of random graphs. Since our results are independent of the choice of this ensemble, we do not discuss its construction in detail. Let us point out, however, that the performance of a particular coding scheme does depend on the choice of the ensemble. The expectation with respect to the channel observations and the graphs is denoted $\mathbb{E}_{\mathcal{C},h}$.
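For a toy code one can check numerically that the Gibbs measure with hard constraints and fields $h_i = \frac12 \ln[p(y_i|0)/p(y_i|1)]$ reproduces the Bayes posterior (3). The sketch assumes a binary symmetric channel with crossover probability `p`, a uniform prior, the Hamming [7,4] matrix and an arbitrary received word. The factor $\sqrt{p(y_i|0)\,p(y_i|1)}$ that separates $p(y_i|x_i)$ from $e^{h_i s_i}$ does not depend on $x$, so it cancels in the normalization.

```python
import itertools
import math

# Illustrative setup (assumptions): Hamming [7,4] checks, BSC with crossover p, arbitrary y.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
p = 0.1
y = [1, 0, 0, 1, 0, 0, 0]
N = len(H[0])

def likelihood(yi, xi):
    """BSC transition probability p_{Y|X}(y_i | x_i)."""
    return 1 - p if yi == xi else p

code = [x for x in itertools.product((0, 1), repeat=N)
        if all(sum(r * xk for r, xk in zip(row, x)) % 2 == 0 for row in H)]

# Bayes rule (3) with a uniform prior on the code.
bayes = [math.prod(likelihood(y[i], x[i]) for i in range(N)) for x in code]
Zb = sum(bayes)
bayes = [w / Zb for w in bayes]

# Gibbs weights exp(sum_i h_i s_i) on the same code (hard checks, J_A = +infinity).
h = [0.5 * math.log(likelihood(y[i], 0) / likelihood(y[i], 1)) for i in range(N)]
gibbs = [math.exp(sum(h[i] * (-1) ** x[i] for i in range(N))) for x in code]
Zg = sum(gibbs)
gibbs = [w / Zg for w in gibbs]

magnetization = [sum(w * (-1) ** x[i] for w, x in zip(gibbs, code)) for i in range(N)]
print(max(abs(a - b) for a, b in zip(bayes, gibbs)))  # of the order of rounding errors
```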
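The MAP decoding rule becomes

$$\hat{s}_i^{\rm MAP} = \mathrm{sign}\, \langle s_i \rangle_{\mathcal{C}} \qquad (5)$$

and the average bit probability of error of the optimal decoder is essentially the overlap of $(\hat{s}_1^{\rm MAP}, \dots, \hat{s}_N^{\rm MAP})$ with the fully ferromagnetic configuration $(1, \dots, 1)$ (the sent codeword):

$$P_{\rm error} = \frac{1}{2N} \sum_{i=1}^{N} \mathbb{E}_{\mathcal{C},h}\big[1 - \mathrm{sign}\, \langle s_i \rangle_{\mathcal{C}}\big] \qquad (6)$$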

The replica or cavity methods can be applied to the calculation of such quantities, and they show that a phase transition occurs [?]. Namely, there is a threshold $n_{\rm MAP}$ such that for $n < n_{\rm MAP}$ the probability of error goes to zero in the thermodynamic limit (ferromagnetic phase), while for $n > n_{\rm MAP}$ it is bounded away from zero. Sparse random graphs are locally tree-like, in the sense that typical loops are long (their length grows like $\ln N$), and they have no boundary. Hence it is reasonable to expect that mean field approaches such as the replica or cavity methods yield exact results. This is for the moment unproven, although some progress in this direction has been made through interpolation methods [?]. As explained below, our use of correlation inequalities yields closely related results.



III. EFFICIENT DECODING 



Although one can optimize the degrees of the Tanner graphs so that $n_{\rm MAP}$ approaches $n_{\rm Sh}$, MAP decoding is computationally too expensive. However, one can take advantage of the fact that low density graphs are locally tree-like (see figure 2). Consider a specified root node $o$ and its neighborhood of depth $d$. As long as $d = O(1)$ with respect to $N$, this neighborhood is a tree with high probability. One can therefore expect that neglecting the loops and solving for the magnetization of the spin system on a tree gives a good approximation. The sign of the magnetization on the tree defines the Belief Propagation (BP) estimate

$$\hat{s}_o^{\rm BP} = \mathrm{sign} \tanh\Big(h_o + \sum_{c \in V(o)} u_{c \to o}\Big) \qquad (7)$$

where $V(i)$ denotes the set of check nodes attached to the variable node $i$. In this formula the fields $u_{c \to i}$ are computed from the iterative procedure

$$u_{C \to i} = \tanh^{-1} \prod_{j \in C \setminus i} \tanh h_{j \to C}, \qquad h_{i \to A} = h_i + \sum_{C \in V(i) \setminus A} u_{C \to i} \qquad (8)$$

with the initial conditions $h^{(0)}_{i \to C} = h_i$.

FIG. 2: Tree-like neighborhood $T_o$ of an arbitrary root node $o$. With high probability the loops are long (of length of order $\ln N$).

The belief propagation decoding algorithm is an iteration based on these exchanges of messages $u_{C \to i}$ from checks to variables and messages $h_{i \to C}$ from variables to checks. It is applied to the full Tanner graph and, despite the presence of loops, it converges and successfully decodes for $n < n_{\rm BP}$. The thresholds are ordered as $n_{\rm BP} \le n_{\rm MAP} \le n_{\rm Sh}$. It should be clear that this algorithm is closely related to the cavity equations of spin glass theory.
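The message passing rules (8) and the decision (7) can be sketched as follows. This is a plain, unoptimized illustration with several assumptions. The fields use the half-log-likelihood normalization $h_i = \frac12 \ln[p(y_i|0)/p(y_i|1)]$. The number of iterations is fixed. The $\tanh$ product is clipped to keep $\tanh^{-1}$ finite. The function name `bp_decode` and the toy parity-check matrix are hypothetical choices.

```python
import math

def bp_decode(H, h, iterations=20):
    """Belief propagation on the Tanner graph of H (illustrative sketch).

    H: M rows of N entries in {0, 1}; h: fields h_i = 0.5 * ln[p(y_i|0) / p(y_i|1)].
    Returns bits, 0 where the total field is positive (spin +1), else 1.
    """
    M, N = len(H), len(H[0])
    checks = [[i for i in range(N) if H[a][i]] for a in range(M)]      # i in C
    var_checks = [[a for a in range(M) if H[a][i]] for i in range(N)]  # V(i)
    v2c = {(i, a): h[i] for a in range(M) for i in checks[a]}  # h_{i->C}^{(0)} = h_i
    c2v = {(a, i): 0.0 for a in range(M) for i in checks[a]}
    clip = 1 - 1e-12  # assumption: clip the tanh product so that atanh stays finite
    for _ in range(iterations):
        # Check to variable: u_{C->i} = atanh prod_{j in C \ i} tanh h_{j->C}
        for a in range(M):
            for i in checks[a]:
                prod = 1.0
                for j in checks[a]:
                    if j != i:
                        prod *= math.tanh(v2c[(j, a)])
                c2v[(a, i)] = math.atanh(max(-clip, min(clip, prod)))
        # Variable to check: h_{i->A} = h_i + sum_{C in V(i) \ A} u_{C->i}
        for i in range(N):
            for a in var_checks[i]:
                v2c[(i, a)] = h[i] + sum(c2v[(b, i)] for b in var_checks[i] if b != a)
    # Decision (7): sign of h_o + sum_{c in V(o)} u_{c->o}
    totals = [h[i] + sum(c2v[(a, i)] for a in var_checks[i]) for i in range(N)]
    return [0 if t > 0 else 1 for t in totals]

# Toy example (Hamming [7,4] matrix, not an LDPC code): first field has the wrong sign.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(bp_decode(H, [-0.3] + [1.0] * 6))  # [0, 0, 0, 0, 0, 0, 0]
```

Messages are stored in dictionaries keyed by edges for readability. A practical decoder would use sparse arrays and stop as soon as the syndrome of the decisions vanishes.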

One of the main problems in the theory of LDPC codes is to optimize the codes so that the various thresholds come as close as possible to $n_{\rm Sh}$. A more basic problem is to compare the error probabilities of the BP and MAP decoders. While this is difficult in general, we show below how the two decoders can be compared for closely related quantities, the generalized EXIT curves, by means of correlation inequalities.

IV. CORRELATION INEQUALITIES 

Here we restrict ourselves to the binary input additive white Gaussian noise channel (BIAWGNC), where the results are more transparent. Mathematically the channel is defined as $y_i = (-1)^{x_i} + W_i$, with $W_i$ i.i.d. $\mathcal{N}(0, n^2)$, so that $n$ is the noise standard deviation. The log-likelihood ratio (or magnetic field) $h_i = \frac{1}{2} \ln \frac{p_{Y|X}(y_i|0)}{p_{Y|X}(y_i|1)} = y_i/n^2$ then has a Gaussian distribution with equal mean and variance, $\mathbb{E}_h[h_i] = \mathbb{V}_h[h_i] = n^{-2}$.

FIG. 3: A pictorial representation of the check erasing inequality.
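The statement about the fields can be checked by simulation. This sketch assumes the convention $s = (-1)^x$ (bit 0 sent as $+1$), the all-zero input word, and $n$ equal to the noise standard deviation. The helper `channel_fields` is hypothetical.

```python
import random

def channel_fields(n, size, seed=0):
    """Sample h_i = (1/2) ln[p(y_i|0) / p(y_i|1)] on the BIAWGNC with noise std n.

    Assumes bit 0 is transmitted as +1 (s = (-1)^x), so y_i = 1 + W_i with W_i ~ N(0, n^2).
    """
    rng = random.Random(seed)
    fields = []
    for _ in range(size):
        y = 1.0 + rng.gauss(0.0, n)
        log_p0 = -(y - 1.0) ** 2 / (2 * n * n)  # Gaussian log-likelihoods; shared constants cancel
        log_p1 = -(y + 1.0) ** 2 / (2 * n * n)
        fields.append(0.5 * (log_p0 - log_p1))  # algebraically equal to y / n^2
    return fields

n = 0.8
hs = channel_fields(n, 200_000)
mean = sum(hs) / len(hs)
var = sum((h - mean) ** 2 for h in hs) / (len(hs) - 1)
print(mean, var, n ** -2)  # the three numbers should be close (n^-2 = 1.5625)
```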
We soften the parity check constraints, replacing $J_A = +\infty$ by independent Gaussian random variables with equal mean and variance, $\mathbb{E}_J[J_A] = \mathbb{V}_J[J_A] = t_A$. The case of hard constraints (the parity checks) is recovered by letting $t_A \to +\infty$. With soft random constraints the Hamiltonian is a Gaussian spin glass with Nishimori gauge symmetry. Contucci, Graffi and Nishimori proved that the following set of inequalities holds for such systems [8]:

$$\mathbb{E}_J[\langle s_X \rangle] \ge 0, \qquad \frac{\partial}{\partial t_Y} \mathbb{E}_J[\langle s_X \rangle] \ge 0, \qquad \text{any } X, Y \subset \{1, \dots, N\} \qquad (9)$$

where $s_X = \prod_{i \in X} s_i$.

The reader will recognize the close similarity to the famous Griffiths-Kelly-Sherman correlation inequalities, which are valid for fully ferromagnetic systems.

These inequalities can be applied to compare the magnetization on the original Tanner graph with that on a tree graph. In the coding context this allows a comparison between the MAP and BP decoders. Consider the Gibbs measure defined by the Gaussian spin glass Hamiltonian with some set of variances $t_A$, $A = 1, \dots, M$. For fixed $d$, the neighborhood $T_o$ of $o$ (see figure 3) is a tree with probability $1 - O(1/N)$, where the implied constant depends on $d$ and on a constant $k$ related to the maximal degree of the nodes. The second correlation inequality implies that if we decrease $t_A$ to zero for the checks outside $T_o$, the average magnetization of site $o$ decreases. This inequality is preserved if we then increase $t_A$ to infinity for the checks inside $T_o$. In other words

$$\mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}] \ge \mathbb{E}_{\mathcal{C},h}\big[\langle s_o \rangle_{T_o} \mid T_o \text{ is a tree}\big]\, \Pr(T_o \text{ is a tree}) \qquad (10)$$

The right-hand side should also include a contribution from the event that $T_o$ is not a tree. By the first correlation inequality this contribution is non-negative, so we can omit it. We refer to this procedure as "check erasing" (see figure 3 for a pictorial illustration). On the tree graph the statistical mechanical sums can be performed exactly, and they yield in a natural way the belief propagation algorithm of the previous section. So

$$\mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}] \ge \mathbb{E}[\langle s_o \rangle_{\rm BP}]\, \big(1 - O(1/N)\big) \qquad (11)$$

where $\langle s_o \rangle_{\rm BP} = \tanh\big(h_o + \sum_{c \in V(o)} u_{c \to o}\big)$ is computed from $d$ iterations of (8).



Finally, one can take the thermodynamic limit $N \to +\infty$ and then the limit $d \to +\infty$. On the right-hand side these limits can be shown to exist. The existence of the thermodynamic limit of the left-hand side, however, is an open problem, so we actually take $\liminf_{N \to \infty}$.



V. GENERALIZED EXIT CURVES 

The probability of error (6) is technically cumbersome to handle. Another quantity, called "extrinsic information transfer" (EXIT) in coding theory, is more convenient to study. It yields the same thresholds as the error probability and, as will become clear below, it is much more natural from the statistical mechanical perspective. Here we define the generalized EXIT curve associated with MAP decoding as
$$g_{\rm MAP}(n) = \lim_{N \to +\infty} \frac{1}{N} \frac{d}{dn} \mathbb{E}_{\mathcal{C}}[H(X_1, \dots, X_N \mid Y_1, \dots, Y_N)] \qquad (12)$$

The conditional entropy of the a posteriori distribution is nothing else than the average entropy of the Gibbs distribution of the spin glass. It should therefore not come as a surprise that it can be related to the free energy:

$$\mathbb{E}_{\mathcal{C}}[H(X|Y)] = \mathbb{E}_{\mathcal{C},h}[\ln Z_{\mathcal{C}}] - \sum_{i=1}^{N} \mathbb{E}_{\mathcal{C},h}[h_i \langle s_i \rangle_{\mathcal{C}}] \qquad (13)$$
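For a single realization of the fields, (13) is the exact identity $-\sum p \ln p = \ln Z - \sum_i h_i \langle s_i \rangle$ for the Gibbs weights $p \propto e^{\sum_i h_i s_i}$ restricted to the code; averaging over the fields gives (13). The sketch verifies the identity on a small code. The $3 \times 6$ matrix and the field distribution (mean and variance 1) are illustrative assumptions.

```python
import itertools
import math
import random

# Illustrative 3 x 6 parity-check matrix (an assumption, not a code from the text).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
N = len(H[0])
rng = random.Random(1)
h = [rng.gauss(1.0, 1.0) for _ in range(N)]  # one realization of the fields (mean = variance = 1)

code = [x for x in itertools.product((0, 1), repeat=N)
        if all(sum(r * xk for r, xk in zip(row, x)) % 2 == 0 for row in H)]
spins = [[(-1) ** xk for xk in x] for x in code]  # s_i = (-1)^{x_i}
weights = [math.exp(sum(hi * si for hi, si in zip(h, s))) for s in spins]
Z = sum(weights)
probs = [w / Z for w in weights]

entropy = -sum(q * math.log(q) for q in probs)  # entropy of the Gibbs measure
magnet = [sum(q * s[i] for q, s in zip(probs, spins)) for i in range(N)]
via_free_energy = math.log(Z) - sum(hi * mi for hi, mi in zip(h, magnet))
print(entropy, via_free_energy)  # equal up to rounding errors
```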

For a BIAWGNC, the derivative with respect to the noise has a simple relation to the magnetization. This is not obvious a priori, because the channel noise does not enter like an external field, and for more general channels the corresponding relation is more complicated. The derivation of (14) from (13) is too lengthy to show here. Let us only note that the main point is to use the Nishimori identities:

$$g_{\rm MAP}(n) = \liminf_{N \to +\infty} \frac{1}{N n^3} \sum_{i=1}^{N} \mathbb{E}_{\mathcal{C},h}\big[1 - \langle s_i \rangle_{\mathcal{C}}\big] = \liminf_{N \to +\infty} \frac{1}{n^3} \mathbb{E}_{\mathcal{C},h}\big[1 - \langle s_o \rangle_{\mathcal{C}}\big], \quad \text{any } o \qquad (14)$$
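A sanity check of the relation between (13) and (14) is possible in the simplest case: a single uncoded bit ($N = 1$, no parity checks), for which $Z = 2\cosh h$ and $\langle s \rangle = \tanh h$. This is not the coded case treated in the text. The sketch assumes $h = (1 + n w)/n^2$ with $w \sim \mathcal{N}(0,1)$, i.e. mean and variance $n^{-2}$. It compares a finite-difference derivative of the entropy with $n^{-3}\,\mathbb{E}[1 - \tanh h]$. The agreement relies on the Nishimori identity $\mathbb{E}[\tanh h] = \mathbb{E}[\tanh^2 h]$. The quadrature parameters are illustrative.

```python
import math

def gaussian_average(f, n, points=4001, width=10.0):
    """E[f(h)] for h = (1 + n w) / n^2 with w ~ N(0, 1), by the trapezoid rule on [-width, width]."""
    step = 2 * width / (points - 1)
    total = 0.0
    for k in range(points):
        w = -width + k * step
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * math.exp(-0.5 * w * w) * f((1 + n * w) / n ** 2)
    return total * step / math.sqrt(2 * math.pi)

def log2cosh(h):
    """ln(2 cosh h), written to avoid overflow for large |h|."""
    a = abs(h)
    return a + math.log1p(math.exp(-2 * a))

def entropy_one_bit(n):
    """(13) for one uncoded bit: E[ln Z] - E[h <s>] with Z = 2 cosh h and <s> = tanh h."""
    return gaussian_average(lambda h: log2cosh(h) - h * math.tanh(h), n)

def g_one_bit(n):
    """(14) for one uncoded bit: n^{-3} E[1 - tanh h]."""
    return gaussian_average(lambda h: 1 - math.tanh(h), n) / n ** 3

n, dn = 0.8, 1e-4
finite_difference = (entropy_one_bit(n + dn) - entropy_one_bit(n - dn)) / (2 * dn)
print(finite_difference, g_one_bit(n))  # should agree to several digits
```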

The following lemma shows that $g_{\rm MAP}(n)$ and $P_{\rm error}$ have the same threshold.

Lemma. Assume communication through a BIAWGNC with noise $n$ and an ensemble of linear codes. Then $g_{\rm MAP}(n) = 0$ if and only if $P_e = \lim_{N \to +\infty} P_{\rm error} = 0$.

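To show that $g_{\rm MAP}(n) = 0$ implies $P_e = 0$, note that by (14) it means $\mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}] = 1$. Then $\mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}^2] - \mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}]^2 = 0$, because of the Nishimori identity $\mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}] = \mathbb{E}_{\mathcal{C},h}[\langle s_o \rangle_{\mathcal{C}}^2]$. The random variable $\langle s_o \rangle_{\mathcal{C}}$ therefore does not fluctuate and equals 1 almost surely. Hence $\mathrm{sign}\, \langle s_o \rangle_{\mathcal{C}} = +1$ and $P_e = 0$. For the converse, we combine Fano's inequality [?] with Jensen's inequality to get $0 \le \frac{1}{N} H(X|Y) \le h(P_{\rm error})$, where $h$ is the binary entropy function. Thus $\lim_{N \to +\infty} \frac{1}{N} H(X|Y) = 0$. If this holds on a whole range of $n$, we can conclude that $g_{\rm MAP}(n) = 0$ on that range.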
Combining (11) and (14) we obtain [?]:

Theorem. Assume communication through a BIAWGNC with noise $n$ and an LDPC ensemble of codes. Then

$$g_{\rm MAP}(n) \le \lim_{d \to +\infty} \frac{1}{n^3}\, \mathbb{E}_{l,h,u}\Big[1 - \tanh\Big(h + \sum_{c=1}^{l} u_c\Big)\Big] \qquad (15)$$

where the right-hand side is computed from the BP algorithm and defines the generalized EXIT curve associated with the BP decoder, $g_{\rm BP}(n)$. The p.d.f. of $h$ is Gaussian with mean and variance $n^{-2}$, $l$ is the random degree of the variable nodes, and the distribution of the $u_c$ is induced by $d$ iterations of the message passing algorithm.
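The right-hand side of (15) is usually evaluated by density evolution. A common Monte Carlo version, often called population dynamics, is sketched below for a regular ensemble with variable degree `l` and check degree `r`; in that case the random degree $l$ in (15) is fixed. The (3,6)-regular choice, the population size and the iteration count are illustrative assumptions, not the ensemble or numerics used in the text. The function name `g_bp` is hypothetical.

```python
import math
import random

def g_bp(n, l=3, r=6, pop=1000, iterations=30, seed=0):
    """Population-dynamics estimate of g_BP(n) for an (l, r)-regular ensemble on the BIAWGNC.

    Fields h are Gaussian with mean = variance = n^{-2}; messages follow the rules (8).
    """
    rng = random.Random(seed)
    eps = n ** -2

    def sample_h():
        return rng.gauss(eps, math.sqrt(eps))

    clip = 1 - 1e-12  # keeps atanh finite when the tanh product rounds to +-1
    v = [sample_h() for _ in range(pop)]  # variable-to-check messages, h^{(0)} = h
    u = [0.0] * pop                       # check-to-variable messages
    for _ in range(iterations):
        new_u = []
        for _ in range(pop):
            prod = 1.0
            for _ in range(r - 1):  # u = atanh prod_{j=1}^{r-1} tanh(v_j)
                prod *= math.tanh(rng.choice(v))
            new_u.append(math.atanh(max(-clip, min(clip, prod))))
        u = new_u
        # v = h + sum of (l - 1) incoming check messages
        v = [sample_h() + sum(rng.choice(u) for _ in range(l - 1)) for _ in range(pop)]
    # n^{-3} E[1 - tanh(h + sum_{c=1}^{l} u_c)], the quantity inside the limit in (15)
    total = sum(1 - math.tanh(sample_h() + sum(rng.choice(u) for _ in range(l)))
                for _ in range(pop))
    return total / (pop * n ** 3)

for n in (0.5, 2.0):
    print(n, g_bp(n))  # essentially 0 at low noise, clearly positive at high noise
```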

Such bounds, and the method used here, extend to the class of (smooth) binary input symmetric channels [?]. These bounds have also been derived recently by the method of physical degradation [5, ?]. To conclude, we briefly discuss a number of consequences of the theorem.

General picture. In general the BP and MAP curves may have several discontinuities, corresponding to several phase transitions in the spin glass. In the simplest (non-trivial) case, with only one discontinuity, their behavior is as follows. For $0 < n < n_{\rm BP}$ we have $g_{\rm BP}(n) = 0$; there is a jump discontinuity at $n_{\rm BP}$, and for $n > n_{\rm BP}$ the curve $g_{\rm BP}(n)$ is strictly positive. The same occurs for $g_{\rm MAP}$, but with the jump discontinuity at $n_{\rm MAP}$, where $n_{\rm MAP} \ge n_{\rm BP}$. Moreover, the BP curve always lies above the MAP curve.

Bound on the MAP threshold. From the definition of the MAP generalized EXIT curve we see that

$$\int_{n_{\rm MAP}}^{+\infty} g_{\rm MAP}(n)\, dn = \liminf_{N \to \infty} \frac{1}{N} \Big( H(X|Y)\big|_{n = +\infty} - H(X|Y)\big|_{n = n_{\rm MAP}} \Big) = R \qquad (16)$$

Indeed, for infinite noise we have no knowledge of the sent signal (so the conditional entropy per bit is $R$), and just below the MAP threshold we have perfect knowledge (the conditional entropy per bit is zero). The theorem then implies

$$R \le \int_{n_{\rm MAP}}^{+\infty} g_{\rm BP}(n)\, dn \qquad (17)$$
where the right-hand side can be computed numerically. Since the right-hand side of (17) decreases as $n_{\rm MAP}$ increases, this yields an upper bound on the MAP threshold. Numerical evaluations tend to show that this bound is tight. This suggests that above the MAP threshold the BP and MAP curves should coincide [?]; this can be proved for the binary erasure channel and some codes [?].
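As a sanity check of (16) in the simplest case without any phase transition, take a single uncoded bit ($R = 1$, no parity checks). For this case $\langle s \rangle = \tanh h$ and the generalized EXIT curve is $n^{-3}\,\mathbb{E}[1 - \tanh h]$. The sketch integrates this curve numerically over $n$. Because (13) is written with natural logarithms, the area comes out as $R \ln 2 = \ln 2$ nats, which is $R = 1$ in bits. The quadrature grids and the cutoffs are illustrative choices.

```python
import math

def mean_one_minus_tanh(n, points=401, width=10.0):
    """E[1 - tanh h] for h = (1 + n w) / n^2 with w ~ N(0, 1), by the trapezoid rule."""
    step = 2 * width / (points - 1)
    total = 0.0
    for k in range(points):
        w = -width + k * step
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * math.exp(-0.5 * w * w) * (1 - math.tanh((1 + n * w) / n ** 2))
    return total * step / math.sqrt(2 * math.pi)

# Integrate g(n) = n^{-3} E[1 - tanh h] over [n_lo, n_hi] with n = e^t, so dn = n dt.
n_lo, n_hi, steps = 0.02, 500.0, 1000
t0, t1 = math.log(n_lo), math.log(n_hi)
dt = (t1 - t0) / steps
area = 0.0
for k in range(steps + 1):
    n = math.exp(t0 + k * dt)
    weight = 0.5 if k in (0, steps) else 1.0
    area += weight * mean_one_minus_tanh(n) / n ** 2 * dt  # n^{-3} times n from dn = n dt
area += 1 / (2 * n_hi ** 2)  # tail: g(n) is close to n^{-3} beyond n_hi
print(area, math.log(2))  # close to each other
```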

Bounds on the conditional entropy. Bounds on the conditional entropy itself can be obtained by integrating the inequality (15). Let us set $h(X|Y) = \liminf_{N \to +\infty} \frac{1}{N} H(X|Y)$. Integrating from $0$ to $n$ we get

$$h(X|Y) \le \int_0^n g_{\rm BP}(n')\, dn' \qquad (18)$$

and integrating from $n$ to $+\infty$,

$$h(X|Y) \ge R - \int_n^{+\infty} g_{\rm BP}(n')\, dn' = \int_0^n g_{\rm BP}(n')\, dn' + \Big(R - \int_0^{+\infty} g_{\rm BP}(n')\, dn'\Big) \qquad (19)$$

When there is no phase transition, one can show that $R = \int_0^{+\infty} g_{\rm BP}(n)\, dn$. We then get an exact expression for the conditional entropy, and its derivative satisfies $g_{\rm MAP}(n) = g_{\rm BP}(n)$. In this situation the model is exactly solved, and the result of the cavity method (or replica symmetric expression) is proved to be exact. However, there is then no fully polarized phase and no error-free communication. When there are one or more phase transitions, the parenthesis on the right-hand side of (19) is strictly negative, so the two bounds on $h(X|Y)$ do not match. It is nevertheless believed that the upper bound (18) is tight above the MAP threshold, because it coincides with the result of the replica symmetric calculation. The same bound has been obtained [?] using the interpolation methods developed by Guerra [?] for the Sherrington-Kirkpatrick model. Clearly, it would be desirable to prove the converse inequality.